FE-883: Orchestrator operational improvements (serve reliability + Opus 4.8) by kostandinang · Pull Request #224 · hashintel/brunch

kostandinang · 2026-06-16T14:18:21Z

Stack Context

Stacks on FE-878 (#222). FE-883 — operational improvements under the FE-864 brownfield-orchestration umbrella (planning PR #212). No longer the single timeout tweak it started as — this branch collects the operational improvements that make brunch serve usable on real brownfield runs.

What?

Cook agent action timeout → 600s — long real-repo actions no longer abort prematurely.
Orchestration on Opus 4.8 — interviewer, cook agents, and the architect default (DEFAULT_ARCHITECT_MODEL_ID) all run claude-opus-4-8; the interviewer uses adaptive thinking + effort, and a fix stops 4.8 fragmenting ask_question into failed tool parts.
Slice grid fidelity — fail rows when the writer pi aborts (including verify-epic), keep the grid current during verification, surface the failure reason + a pinned halt summary, show per-slice attempt count, and a "not-started" evaluate gate so a greenfield slice isn't painted ✗ before its tests exist.
serve --land — merge the promoted cook branch into the active branch as the final serve step; greenfield runs (no repo branch to land onto) warn instead of silently ignoring the flag.
Run-artifact composer — a git-merge-based run artifact (run-artifact.ts / run-refs.ts) composes the completed run.
Epic-merge dep resolution — the merged __epic__ tree re-links the parent's symlinked node_modules so verify-epic and probes have resolvable deps.
Dev-server quiet — stop the dev server reloading on .brunch cook worktrees.
Presentation polish — recipe banner lexicon, exact out-dir reporting, verbose warnings.

Why?

The spec-23 run exposed serve reliability as a broad issue, not a single timeout: actions timed out on real repos, the architect's model default produced weaker fallback plans that amplified slice counts, and slice failures — especially writer and epic-verify aborts — vanished from the grid with no reason. This branch is the FE-864 umbrella's implementation seam for those concrete operational fixes.

Tests

Per-seam unit coverage for the changes above: serve --land arg mapping + greenfield warning, verify-epic failed-row emission, the epic-merge node_modules re-link, the architect default model id, the run-artifact composer, and the slice-grid presenter.

cursor · 2026-06-16T14:18:31Z

PR Summary

Medium Risk
Touches brownfield git promotion, optional merge onto the user’s active branch, and widespread ref renames—behavior changes are mostly covered by tests but real-repo edge cases (conflicts, stale worktrees) warrant careful manual verification.

Overview
This PR broadens FE-864 from a timeout tweak into orchestration and UX changes aimed at real brownfield brunch serve runs.

Cook execution and sandboxes: Per-slice git worktrees are provisioned lazily on first use (ensureSliceWorktree) instead of seeding every slice up front. node_modules is symlinked from the parent worktree (and re-linked after epic merges) rather than CoW-copied per slice. Worktree setup runs git worktree prune before adds. The pi agent timeout rises to 600s; cook rejects unknown CLI flags and throws on plan/Petrinaut errors so the Ink presenter can unmount cleanly.

Git artifacts: Run/slice refs move from cook/… / cook-slice/… to centralized brunch/run/<id> and brunch/slice/<id>/<slice> via brunchRef. landCookBranch optionally merges a promoted run branch onto the active branch ( serve --land → landBranch; refuses dirty/detached HEAD, aborts on conflict). New run-artifact (harvestCookRun, merge-tree fold) adds per-slice git history with fail-closed conflicts (tests only in this diff—not yet wired into the main promote path).

Live CLI: The event bus gains run-shape, slice, and cook-done. The Ink TUI shows an epic/slice grid (steps, retries, failure reasons, halt summary), tool-call heartbeats (instrumented pi tools + latest-line progress), a figlet wordmark in scrollback via <Static>, and brigade serve when a run completes successfully. evaluate-done treats “no test files” as absent so greenfield gates don’t paint phantom failures; runner output strips [agent-tail] noise.

Defaults and misc: Architect/cook models default to Opus (claude-opus-4-8 / docs claude-opus-4-6). Vite dev ignores .brunch/**. Chat API accepts follow-up history with output-error tool parts.

^{Reviewed by Cursor Bugbot for commit 21e1653. Bugbot is set up for automated code reviews on this repo. Configure here.}

kostandinang · 2026-06-16T14:18:41Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Each cook agent action (write-tests, write-code, verify-epic) runs under a per-action wall-clock budget enforced in pi-actions.ts. Raise the default from 300s to 600s so Sonnet agents have headroom on larger slices and on brownfield repos where setup/discovery eats into the turn. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…eam, clean failures Iterating on the live TUI from real-terminal feedback: - One global run timer in the footer instead of a per-item clock on every pending row (and whole-second, no jittery decimals). - "brunch" wordmark is now a big lowercase figlet (Slant) in a warm orange gradient, replacing the egg. - Activity log + wordmark stream through Ink <Static> so the full run lands in scrollback instead of collapsing in a redrawn bounded box; line cap removed. - Brigade tracker no longer lights "taste" mid-cook — per-slice verify actions fire during cooking, so taste stays unlit until a real end-of-cook signal. - Failures throw instead of process.exit, so withCookBus disposes (unmounts Ink) before the error prints — no more frozen "prep ◐" hang. cook validates args before mounting the TUI and rejects unknown flags (e.g. --spec-id). check + presenter/cook/pi-actions tests green; full build deferred (active graphite stack navigation). Co-Authored-By: Claude <noreply@anthropic.com>

Wires the two remaining kitchen-brigade phases faithfully to the orchestrator-arcs mapping (verify→taste, ship→serve): - taste lights on the epic-verification verdict (action `epic <id> → …`), not on per-slice `verify <target>` lines — those fire mid-cook and previously lit taste while still cooking. - serve lights on a new `cook-done` event emitted at the end of runCook (after promotion); a halted run never ships, so it never lights serve. phase.test covers both signals + the full prep→cook→taste→plate→serve walk; check + presenter/cook tests green. Co-Authored-By: Claude <noreply@anthropic.com>

Each cook pi session was a black box in the pending panel — just a KB count. runPi already subscribes to the session's text stream; instead of bytes, surface the agent's latest non-empty line (tail-truncated, throttled every 2 KB) as the activity-progress detail, so a wait reads as live work ("agent writing tests · …adds the RefreshToken guard") rather than "still going". Kept headless createAgentSession — no pi InteractiveMode, no new pi API: pi's tool-call events come via an extension hook (on('tool_call')), not the subscribe stream, so a richer "editing <file> / running <tool>" heartbeat is a separate follow-up that needs the extension-registration path verified. check + pi-actions/presenter tests green. Co-Authored-By: Claude <noreply@anthropic.com>

Richer "what the agent is doing" in the pending panel (the spike's Option A, full tier): instead of only the agent's latest line, show the tool calls — "edit src/auth/token.ts", "bash bun test", "grep RefreshToken". pi exposes no tool-call hook on session.subscribe (text/lifecycle only), so buildSessionOptions now supplies the built-in tools itself via customTools + noTools:'builtin': each createXToolDefinition(cwd) is wrapped to emit a label from its params, then delegates unchanged. The builders bake in the real config (withFileMutationQueue, truncation defaults), so behavior is preserved — confirmed in pi's edit.js. Observation is fail-safe (emit in try/catch). toolLabel + instrumentToolDefinition are pure/unit-tested (label mapping; wrap delegates same args + result; observer error can't break a tool call). Caveat: the customTools/noTools runtime wiring isn't covered by tests (they stub createSession, bypassing buildSessionOptions) — needs a real cook run to confirm the agent receives the instrumented tools and they emit live. check + pi-actions tests green. Co-Authored-By: Claude <noreply@anthropic.com>

Brownfield cook provisioned every slice's git worktree eagerly in wireHandlers — N `git worktree add` + N recursive node_modules CoW copies paid synchronously at startup before any slice fired. - Move slice-worktree creation into resolveSliceCwd via idempotent ensureSliceWorktree, so a slice's worktree is materialized on first fire. A run touching 2 of 8 slices pays for 2 worktrees, not 8. Synchronous provisioning serializes concurrent fires on the JS thread, so parallel-policy worktree adds never overlap. - Symlink each slice's node_modules to the parent worktree's single copy instead of CoW-copying per slice (SHAREABLE_TOP_LEVEL_ENTRIES). walkFiles already skips symlinks, so the shared tree is never re-walked during dep seeding, merge, or promotion. Other gitignored dirs still copy per slice. Correctness-neutral: same worktrees/branches, just lazy; deps resolve through the symlink. npm run verify green; adds symlink + idempotency unit tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

`brunch serve --land` opt-in: after a brownfield cook promotes to cook/<runId>, merge that branch into the repo's checked-out branch as serve's final step. landCookBranch refuses on a dirty tree / detached HEAD and aborts on conflict — the cook branch is always left intact. Plain `cook` and default `serve` are unchanged, so FE-877's "never freelance into the working branch" invariant holds unless opted in. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The interviewer (interview.ts) and the three cook agent actions (write-tests / write-code / verify-epic in pi-actions.ts) move from claude-opus-4-6 to claude-opus-4-8. Secondary-chat and the plan-architect default are left on 4.6 intentionally. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Opus 4.8 rejects thinking.type=enabled ("Use thinking.type.adaptive and output_config.effort"). Swap the interviewer's enabled/budgetTokens block for { type: 'adaptive' } + effort: 'medium'. Only interview.ts sets explicit thinking provider options; secondary-chat and plan-architect don't. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The grid made the happy path legible; this does the unhappy path. - events: slice + cook-done gain `reason?`. - pi-actions: failed-slice emits carry a reason — evaluate-done maps failureKind ('infra error' / 'tests failed'); write-tests/write-code catches use 'test authoring failed' / 'code authoring failed'. - cook-cli: cook-done carries result.reason on halt. - run-store: slice rows store `reason`; RunState.haltReason set from cook-done(ok:false), unset on completion. - ink: a failed slice row shows its reason (red); a HaltSummary footer pins "✗ halted · <reason>" (truncated) when the run halts. Retry-attempt counts ("3/3") DEFERRED — retryCount lives on the petri-net tokens, not the action context; that's a separate slice. Verified in an isolated worktree off the TUI branch: check clean; presenter + pi-actions + cook tests green (141). The one full-suite failure is build-boundary.test (client production-chunk manifest) — unrelated to this server-only change; a worktree/symlinked-node_modules artifact, re-verify on a normal checkout. Co-Authored-By: Claude <noreply@anthropic.com>

Completes the retry half of failure legibility — without net-token plumbing. A slice loops failed→running on retry, so the attempt count is derivable from the slice-event stream the store already folds: queued→running is attempt 1, a step change mid-run keeps it, failed→running bumps it. The grid shows "attempt N" only for N≥2 (no clutter on the first run). No engine/net changes: retryCount on the petri tokens isn't needed — the observable slice transitions carry the same information. check + presenter tests green (46). Co-Authored-By: Claude <noreply@anthropic.com>

The attempt count now reads "attempt 2/4" — n over the total attempts allowed (retry budget + 1). cook-cli emits maxRetries on run-shape; the store derives maxAttempts = maxRetries + 1; the grid shows it only once a slice has retried (n ≥ 2), and falls back to a bare "attempt n" when the budget is unknown. Denominator confirmed against the net: retryCount halts at `>= maxRetries` (starts 0), so a slice gets maxRetries + 1 total attempts. check + presenter tests green (49). Co-Authored-By: Claude <noreply@anthropic.com>

Opus 4.8 otherwise splits a single `ask_question` into several parallel partial calls (one option each, or leaked tool-call XML), which land as `output-error` parts. - interview.ts: set `disableParallelToolUse` so the model emits one tool call per step. - app.ts: `stripFailedToolParts` drops echoed `output-error` tool parts before `validateUIMessages`. They carry `rawInput` but no `input`, so the output-error schema rejects them and bricks the whole follow-up turn; the retried call is what matters, the dead attempt carries no history value. - app.test.ts: covers a follow-up turn whose history contains a failed attempt followed by a successful retry. verify green (check + test + build). 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

Evaluate gate (phantom failure fix): - Add `absent` TestFailureKind for vitest "No test files found" (nothing built yet), distinct from a genuine `test` red ("Cannot find module" stays `test`). - runVerification aggregate precedence infra > test > absent. - evaluate-done no longer emits `failed` for the unbuilt greenfield gate — the slice stays running and flows to write-tests as the SAME attempt, killing the phantom n/max + ✗ NEEDS WORK on every clean slice. - Strip `[agent-tail]` harness noise from captured runner output. Promotion ref namespace + composer: - run-refs.ts: centralize the run git-ref namespace (brunch/run/<id>, brunch/slice/<id>/<sid>), replacing duplicated cook/ + cook-slice/ strings; prune stale worktree registrations before `git worktree add`. On-disk .brunch/cook/runs/ path unchanged (FE-743 compat). - run-artifact.ts: fold per-slice branches into the run branch via `git merge-tree` plumbing in dependency order — real conflicts surface fail-closed, one merge node per slice, `squash` granularity retained. Pure plumbing: no working tree / index / active branch touched. Not yet wired into cook-cli (needs a live-run check of the dependency-seed interaction). 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>

Cook runs write per-slice worktrees under .brunch/cook/runs/<id>/worktree/, which Vite's default file watcher picks up — every slice edit triggered a page reload and tsconfig-change full reload in the host dev server. - vite-client.ts: server.watch.ignored = ['**/.brunch/**']. Merged with Vite's built-in ignores (.git, node_modules, cacheDir), so run artifacts no longer drive HMR. 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>

project-detect.test.ts had the "resolves the runner from workspace packages in a monorepo" describe block verbatim twice — remove the copy. 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 4 total unresolved issues (including 2 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit f96b365. Configure here.}

cursor · 2026-06-17T16:37:42Z

+    const relocated = withTestDir(PROFILES['node-vitest'].toolchain, 'src');
+    expect(relocated.testCommand('src/x.test.ts')).toEqual(['npx', 'vitest', 'run', 'src/x.test.ts']);
+  });
+});


Duplicated withTestDir test block

Low Severity

The same describe('withTestDir relocates test targets…') suite appears twice in project-profile.test.ts, with identical cases duplicated verbatim starting at line 168.

^{Reviewed by Cursor Bugbot for commit f96b365. Configure here.}

Greenfield serves accepted --land but never landed: landing only runs in the brownfield (codebase) promotion path. Greenfield has no repo branch to land onto, so surface the no-op instead of silently swallowing the flag. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

write-tests and write-code emit a failed slice event before rethrowing on runPi failure; verify-epic did not, so epic-verification failures left the slice grid without a failed row or reason. Mirror the wrapper on the epic's representative slice. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

mergeSlicesIntoEpicSandbox builds __epic__ trees via walkFiles, which skips symlinks, so the parent's symlinked node_modules never reached the merged cwd. verify-epic and probes then ran without resolvable deps even though per-slice verification passed. Re-link the shareable entries after the merge. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

The `withTestDir relocates test targets…` describe block was copied verbatim a second time. Remove the duplicate. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

Cook actions (write-tests/write-code/verify-epic) and the interview already run on claude-opus-4-8; the architect default lagged at claude-opus-4-6. Lift it to 4-8 so plan, cook, and interview paths share the latest Opus. 🍳 Built with brunch Co-Authored-By: Opus 4.8 <noreply@anthropic.com>

… detail Three serve/plan presentation fixes the dogfood run surfaced: - Recipe banner: the header said "brunch plan" / "✓ plan" while the phase tracker right below it reads "recipe". Rename both to "recipe" so the planning surface speaks one brigade term (the CLI command stays `plan`; this is the phase label). - out folder: plan-start showed the launch root (e.g. the repo dir). Emit the spec plan dir (.brunch/cook/specs/<id>) — the exact place plan.yaml lands. - Verbose warnings: --verbose now appends a one-sentence plain-English account after each terse warning code (explainEmitterWarning / explainReconciliationWarning), so a reviewer needn't know the code vocabulary. Default verbosity keeps the terse line. Also fix a pre-existing typecheck break in run-artifact.test.ts (mode 'codebase' → 'greenfield'; PlanMode is greenfield|brownfield) that was reddening the gate. 🍳 Built with brunch Co-Authored-By: claude-opus-4-8 <noreply@anthropic.com>

kostandinang mentioned this pull request Jun 16, 2026

FE-879: Lazy per-slice cook worktrees and shared node_modules for brownfield #223

Open

kostandinang changed the title ~~FE-864: raise pi action timeout to 600s~~ FE-864: Raise pi-action timeout to 600s Jun 16, 2026

kostandinang force-pushed the ka/fe-878-brunch-serve branch from 90bb5ef to 40a9d88 Compare June 16, 2026 18:05

kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from 5b145b2 to 7d39fe7 Compare June 16, 2026 18:05

kostandinang changed the title ~~FE-864: Raise pi-action timeout to 600s~~ FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec Jun 16, 2026

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/orchestrator/src/epic-sandbox-merge.ts

Comment thread src/orchestrator/src/epic-sandbox-merge.ts

kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from a5bfc10 to ac4e47c Compare June 16, 2026 23:45

kostandinang force-pushed the ka/fe-878-brunch-serve branch from 40a9d88 to 05b471a Compare June 16, 2026 23:45

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/orchestrator/src/pi-actions.ts Outdated

kostandinang force-pushed the ka/fe-878-brunch-serve branch from 05b471a to 89e7850 Compare June 16, 2026 23:55

kostandinang force-pushed the ka/fe-864-pi-timeout-600s branch from ac4e47c to 3e02bfb Compare June 16, 2026 23:55

cursor Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/orchestrator/src/net-compiler.ts

Comment thread src/orchestrator/src/presenter/run-store.ts

kostandinang changed the title ~~FE-864: Orchestrator improvements umbrella — brownfield feature delivery from spec~~ FE-864: Orchestrator improvements Jun 17, 2026

kostandinang and others added 6 commits June 17, 2026 09:36